A Statistical Exploration of Himalayan Expeditions

INFO 526 Final Project Proposal

Analysis of Himalayan expeditions from 2020–2024

Author
Affiliation

Trevor Macdonald & Nandakumar Kuthalaraja

School of Information, University of Arizona

Abstract

High-altitude mountaineering offers a natural laboratory for studying how human, environmental, and logistical factors interact to shape risk and achievement. Using tidy versions of the Himalayan Database, this project leverages R (tidyverse, tidy models) and Quarto to build an interactive, reproducible data-visualisation narrative around two guiding questions:

  • What factors contribute to the success or failure of a summit?

    • We knit together expedition, peak, and temporal attributes to explore how season, route choice, peak height, team size, leader nationality, weather windows, and oxygen strategy correlate with summit outcomes. Visual encodings include faceted ridge plots of summit probability across seasons, interactive geographic representations of route popularity versus success.

    • A multilevel logistic model provides effect-size estimates while accounting for peak-level heterogeneity.

  • Does an expedition’s funding affect safety outcomes?

    • Because direct cost metrics are absent, we construct a funding proxy that blends sponsorship type, total members & hired staff, oxygen usage, and expedition duration.

    • We then relate this composite to safety indicators like member and staff fatalities, high-altitude illnesses, rescue events via scatter-plot matrices, concentration curves, and Poisson regression. Stratified visualizations highlight differential risk patterns between commercial and self-organised teams.

Preliminary findings suggest that

- seasonality and route normalization remain the strongest predictors of summit success

- larger, commercially sponsored teams enjoy slightly higher success but only marginal safety advantages once altitude and weather are controlled

- resource intensive strategies such as extensive oxygen deployment reduce high-altitude illness yet do not eliminate fatal risk.

Introduction

This project looks at data from many Himalayan mountain climbing trips. We use two main files: one has details about each expedition (called exped_tidy.csv), and the other has information about the mountains themselves (peaks_tidy.csv). The expedition data covers hundreds of trips over many years. For each trip, we know things like which mountain was climbed, how many people were in the team, if they used oxygen, if it was a commercial (guided) trip, and whether the team reached the summit or not.

The mountain (peaks) data gives us extra facts, like each mountain’s name, height, and what region it is in. By putting these two datasets together, we can explore how things like team size, the use of oxygen, type of trip, and which mountain was climbed affect the chances of reaching the summit.

Question 1: What factors contribute to the success or failure of a summit?

  • Analyze expedition outcomes using both expedition and peak variables. “Success” will be defined based on the `TERMREASON_FACTOR`, with specific values representing a successful summit.

  • Use factors like – geography, team size, year, and commercial vs non-commercial expeditions

Introduction

Climbing a Himalayan peak is a big challenge, and not every team that tries will reach the top. Some expeditions succeed, while others turn back before the summit. But what makes the difference? In this project, we want to understand what factors help climbers succeed and what might make them fail.

We have used following factors:

Explore Team Size
Effect of Geography
Explore by Year
Explore Commercial Factor

Approach

  • Get Data from the CSV files
Code
exped <- read_csv("data/exped_tidy.csv")
peaks <- read_csv("data/peaks_tidy.csv")

colnames(peaks)
head(peaks)
colnames(exped)
head(exped)

table(exped$SUCCESS1, useNA = "ifany")
  • Build Model & Variable Inferences
Code
table(exped$SUCCESS1, useNA = "ifany")
exped2 <- exped |>
  mutate(
    success = ifelse(SUCCESS1 %in% c("Y", "Yes", "yes"), 1,
                     ifelse(SUCCESS1 %in% c("N", "No", "no"), 0, as.numeric(SUCCESS1))),
    peak = as.factor(PEAKID),
    team_size = as.numeric(TOTMEMBERS),
    oxygen = ifelse(O2USED %in% c("Y", "Yes", "yes"), 1,
                    ifelse(O2USED %in% c("N", "No", "no"), 0, as.numeric(O2USED))),
    season = as.factor(SEASON)
  ) |>
  filter(!is.na(success), !is.na(team_size), !is.na(oxygen)) 

model <- glm(
  success ~ peak + team_size + oxygen + season + YEAR,
  data = exped2,
  family = "binomial"
)
  • Start plotting various charts building up to the case

Start

The Overall Success vs Failure Rate out of all expeditions.

Code
exped2 |>
  mutate(outcome = ifelse(success == 1, "Success", "Failure")) |>
  group_by(outcome) |>
  summarize(n = n()) |>
  mutate(rate = n / sum(n)) |>
  ggplot(aes(x = outcome, y = rate, fill = outcome)) +
    geom_col(width = 0.5) +
    scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("Success" = "forestgreen", "Failure" = "firebrick")) +
    labs(
      title = "Overall Expedition Success vs Failure Rate",
      x = "Expedition Outcome",
      y = "Proportion"
    ) +
    theme_minimal()

We will study with following variables:

Potential Factors to Study Success Rate
Explore Team Size
Effect of Geography
Explore by Year
Explore Commercial Factor

Explore Team Size

Code
ggplot(exped2, aes(team_size)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  facet_wrap(~ success, labeller = labeller(success = c(`0` = "Failure", `1` = "Success"))) +
  labs(
    x     = "Team Size",
    y     = "Count",
    title = "Distribution of Team Sizes\nfor Success vs. Failure"
  )

Code
ggplot(exped2, aes(x = team_size, fill = factor(success))) +
  geom_bar(position = "fill") +
  scale_x_continuous(
    breaks = seq(0, max(exped2$team_size, na.rm = TRUE), by = 10),
    expand = expansion(add = c(0, 0))
  ) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    x     = "Team Size",
    y     = "Proportion",
    fill  = "Success\n(1 = yes)",
    title = "Proportion of Success vs. Failure\nby Team Size"
  ) +
  theme(
    axis.text.x = element_text(angle = 90, vjust = 0.5),
    plot.title  = element_text(hjust = 0.5)
  )

Code
df_bin <- exped2 |>
  group_by(team_size) |>
  summarize(rate = mean(success), n = n(), .groups="drop") |>
  filter(n > 3)

ggplot(df_bin, aes(x = team_size, y = rate)) +
  geom_segment(aes(xend = team_size, y = 0, yend = rate), color = "grey70") +
  geom_point(size = 2) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    x     = "Team Size",
    y     = "Success Rate",
    title = "Lollipop Chart of Success Rate by Team Size",
    caption = "Source: Himalayan Database (2020–2024)"
  )

Explore Team Size Analysis

These plots suggests a non-linear relationship between team size and summit success:

  • Very small teams (1–5 climbers) tend to have very high success rates.

    Probable cause - agility, lighter logistical needs, and tighter decision-making.

  • Small-to-medium teams (3–8 climbers) show a noticeable dip in success.

    Probable cause - enough complexity to introduce friction but not enough manpower to provide robust support.

  • Medium-large teams (9–20 climbers) exhibit steadily increasing success rates, peaking around 20 members at or near 100%.

    Probable cause - have enough members to spread tasks, establish well-stocked camps without yet suffering the full burden of very large-group logistics.

  • Very large teams (20+) see a gradual decline in success.

    Probable cause - supplies, campsite congestion, decision latency.

Effect of Geography

Code
peak_success <- exped2 |>
  left_join(peaks |> select(PEAKID, PKNAME, HEIGHTM), by = "PEAKID") |>
  group_by(PEAKID, PKNAME, HEIGHTM) |>
  summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
  filter(n > 10)

ggplot(peak_success, aes(x = HEIGHTM, y = success_rate, label = PKNAME)) +
  geom_point(color = "steelblue", size = 3, alpha = 0.7) +
  geom_text(vjust = -0.8, size = 2.5, check_overlap = TRUE) +
  labs(
    title = "Altitude vs. Success Rate Across Peaks",
    x = "Altitude (meters)",
    y = "Success Rate"
  ) + geom_smooth(method = "loess", se = FALSE, color = "darkred")

Code
peak_success <- exped2 |>
  left_join(peaks |> select(PEAKID, PKNAME, HEIGHTM), by = "PEAKID") |>
  group_by(PKNAME, HEIGHTM) |>
  summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
  filter(n > 10)

ggplot(peak_success, aes(x = reorder(PKNAME, HEIGHTM), y = HEIGHTM, color = success_rate, size = n)) +
  geom_point(alpha = 0.8) +
  scale_color_gradient(low = "red", high = "green") +
  labs(
    title = "Altitude and Success Rate Across Peaks",
    x = "Peak",
    y = "Altitude (meters)",
    color = "Success Rate",
    size = "Number of Expeditions"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1)
  )

Effect of Geography Analysis

Geography, in particular the altitude band of a peak—turns out not to be a simple “the higher you go, the harder it gets” story.

  • Low-altitude peaks (~6800 m, e.g. Ama Dablam) enjoy very high success rates, thanks to gentler slopes, shorter approaches and straightforward rope-fixed “ladder” routes

  • Mid-altitude peaks (~7100–7300 m, like Pumori and Baruntse) dips to the bottom of the curve,mountains are often more technical, less commercialized, with fewer fixed ropes and smaller teams—so even though they’re “lower,” they prove tougher

  • Higher Peaks (8000 m + like Everest) enjoys very high success rates, as they benefit from massive infrastructure (high-traffic base camps, helicopter support etc.), which more than offsets the physiological challenge of extreme altitude

Overall, a peak’s infrastructure and traffic are at least as critical to success as its height.

Explore by Year

Code
yearly_success <- exped2 |>
  group_by(YEAR) |>
  summarize(
    success_rate = mean(success == 1, na.rm = TRUE),
    n = n()
  ) |>
  filter(n > 1)

ggplot(yearly_success, aes(x = YEAR, y = success_rate)) +
  geom_line(color = "steelblue", size = 1.2) +
  geom_point(color = "darkred", size = 2) +
  geom_smooth(method = "loess", se = FALSE, color = "forestgreen", linetype = "dashed") +
  labs(
    title = "Expedition Success Rate by Year (with Trend)",
    x = "Year",
    y = "Success Rate"
  ) +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal()

Code
peak_year <- exped2 |>
  left_join(peaks |> select(PEAKID, PKNAME), by = "PEAKID") |>
  group_by(YEAR, PKNAME) |>
  summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
  filter(n > 3)

# Plot heatmap
ggplot(peak_year, aes(x = YEAR, y = reorder(PKNAME, -success_rate), fill = success_rate)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "red", high = "green") +
  labs(
    title = "Success Rate by Year and Peak",
    x = "Year",
    y = "Peak",
    fill = "Success Rate"
  ) +
  theme_minimal()

Explore by Year Analysis

These plots suggests, despite a pandemic-driven slump in 2021, overall expedition success has rebounded strongly by 2024

  • 2020 average success rate ~ 72%

    Probable cause – pre-pandemic normalcy, full staffing, established logistics.

  • 2021-2022 average success rate fell to ~ 63%

    Probable cause – COVID travel restrictions & teams adapted to new health protocols and smaller expedition sizes.

  • 2023-2024 average success rate rebound to ~ 80%

    Probable cause – full operational recovery, improved high-altitude gear, accumulated guiding experience.

Commercial Factor

Code
exped2 <- exped2 |>
  mutate(
    commercial = case_when(
      COMRTE == TRUE ~ "Commercial",
      COMRTE == FALSE ~ "Non-Commercial",
      TRUE ~ NA_character_
    )
  )

comm_year <- exped2 |>
  filter(!is.na(commercial)) |>
  group_by(YEAR, commercial) |>
  summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
  filter(n > 3)

ggplot(comm_year, aes(x = YEAR, y = success_rate, color = commercial)) +
  geom_line(size = 1.1) +
  geom_point() +
  scale_color_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray40")) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Success Rate by Year: Commercial vs Non-Commercial",
    x = "Year",
    y = "Success Rate",
    color = "Expedition Type"
  ) +
  theme_minimal()

Code
exped2 |>
  filter(!is.na(commercial)) |>
  left_join(peaks |> select(PEAKID, PKNAME), by = "PEAKID") |>
  group_by(PKNAME, commercial) |>
  summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
  filter(n > 5) |>
  ggplot(aes(x = reorder(PKNAME, -success_rate), y = success_rate, fill = commercial)) +
    geom_col(position = "dodge") +
    coord_flip() +
    scale_fill_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray70")) +
    labs(
      title = "Success Rate by Peak and Expedition Type",
      x = "Peak",
      y = "Success Rate",
      fill = "Type"
    ) +
    theme_minimal()

Code
exped2 |>
  filter(!is.na(commercial)) |>
  group_by(team_size = TOTMEMBERS, commercial) |>
  summarize(success_rate = mean(success == 1, na.rm = TRUE), n = n()) |>
  filter(n > 3, !is.na(team_size)) |>
  ggplot(aes(x = team_size, y = success_rate, color = commercial)) +
    geom_point(size = 2, alpha = 0.7) +
    geom_smooth(se = FALSE) +
    scale_color_manual(values = c("Commercial" = "skyblue", "Non-Commercial" = "gray70")) +
    scale_y_continuous(labels = scales::percent) +
    labs(
      title = "Team Size vs Success Rate by Expedition Type",
      x = "Team Size",
      y = "Success Rate",
      color = "Expedition Type"
    ) +
    theme_minimal()

Commercial Factor Analysis

These plots suggests, A professional, well-resourced support structure not only accelerates recovery from external shocks (like a pandemic) but also delivers uniformly higher success rates—regardless of mountain or team size.

  • Commercial expeditions rebound faster and to a higher plateau

    In 2021, both types dipped (COVID impact), but commercial climbs dropped only to ~67%, whereas non-commercial fell to ~53%. By 2024, commercial success surged to ~95 %, while non-commercial languished around ~64%.

  • Across individual peaks, company-run trips consistently outshine independent ones

    On major peaks (Everest, Lhotse, Ama Dablam), commercial success rates hover in the 85–95 % range. Non-commercial attempts on the same mountains are 5–10 % lower.

  • Team-size effects differ by expedition type

    Commercial: success stays high (75–100 %) across all team sizes, peaking around 20 members.
    Non-commercial: very small groups (> 80 %) and very large groups (~75 %) do better, but mid-sized teams (5–10 climbers) dip down near 50 %

Question 2

The analysis will be very similar for this question. I suspect there will be some overlap, funding and success rate etc. We will use many of the same variables, but more specifically, FUNDING_TYPE.

We will Compare specific funding types like commercial vs. independent expeditions across countries for more insights

NOTE: Add “real world” examples of expedition success or failure to make analysis relatable.

Approach

  • Get Data from the CSV files
Code
exped <- read_csv("data/exped_tidy.csv")
  • Data Wrangling & Variable Inferences
Code
# Lean tibble of variables of interest
exped_q2 <- exped |>
  select(
    YEAR,           # Year of the expedition
    SPONSOR,        # Sponsor or funding organization
    SUCCESS1,       # Success on primary route
    SUCCESS2,       # Success on second route
    SUCCESS3,       # Success on third route
    SUCCESS4,       # Success on fourth route
    TERMDATE,       # Date expedition was terminated
    TERMREASON,     # Numeric code for termination reason
    TERMREASON_FACTOR, # Description of termination reason
    TOTMEMBERS,     # Total members in the team
    MDEATHS,        # Number of member deaths
    HDEATHS,        # Hired staff deaths
    OTHERSMTS,      # Other summits during expedition
    ACCIDENTS,      # Description of accidents
    AGENCY,         # Trekking/expedition agency
    O2USED          #
  ) |>
  mutate(
    ANY_SUCCESS = SUCCESS1 | SUCCESS2 | SUCCESS3 | SUCCESS4,
    FATALITIES = (MDEATHS + HDEATHS > 0),
    OUTCOME_LABEL = case_when(
      ANY_SUCCESS & !FATALITIES ~ "Success, No Deaths",
      ANY_SUCCESS & FATALITIES  ~ "Success + Deaths",
      !ANY_SUCCESS & FATALITIES ~ "Failure + Deaths",
      TRUE                      ~ "Failure, No Deaths"
    )
  ) |>
  select(-SUCCESS1, -SUCCESS2, -SUCCESS3, -SUCCESS4, -MDEATHS, -HDEATHS)


exped_q2 <- exped_q2 |> 
  mutate(
    TERMINATION_TYPE = case_when(
      TERMREASON %in% c(1, 2, 3) ~ "Successful",
      TERMREASON %in% c(4, 5) ~ "Bad Weather/Conditions",
      TERMREASON %in% c(6, 7) ~ "Accident/Illness/Exhaustion",
      TERMREASON %in% c(8, 9, 10) ~ "Logistics/Technical",
      TERMREASON %in% c(11, 12, 13) ~ "No Attempt/Base Only",
      TERMREASON == 14 ~ "Other",
      TRUE ~ "Unknown"
    )
  )
Code
library(stringr)

# Source: https://stackoverflow.com/questions/59082243/multiple-patterns-for-string-matching-using-case-when

sponsor_prefix_tbl <- exped_q2 |> 
  mutate(
    SPONSOR_CLEAN = str_to_lower(str_trim(SPONSOR)),
    SPONSOR_CLEAN = str_remove(SPONSOR_CLEAN, "\\s*\\b20\\d{2}\\b$"),
    SPONSOR_PREFIX = str_extract(SPONSOR_CLEAN, "^\\w+\\s*\\w*")
  ) |> 
  count(SPONSOR_PREFIX, sort = TRUE)

#Sponsors <- exped_q2 |> 
  #distinct(SPONSOR_CLEAN) #|> 
 #count()

exped_q2 <- exped_q2 |> 
  mutate(
    SPONSOR_CLEAN = str_to_lower(str_trim(SPONSOR)),
    SPONSOR_CLEAN = str_remove(SPONSOR_CLEAN, "\\s*\\b20\\d{2}\\b$"),
    SPONSOR_PREFIX = str_extract(SPONSOR_CLEAN, "^\\w+\\s*\\w*"),
    FUNDING_TYPE = case_when(
      str_detect(SPONSOR_PREFIX, "army|police|military|defense") ~ "Military",
      str_detect(SPONSOR_PREFIX, "university|college|school|club|jac|univ") ~ "Academic/Alpine Club",
      str_detect(SPONSOR_PREFIX, "guides|treks|adventure|travel|ascent|trip|climbalaya|madison|imagine|kobler|makalu|highland|satori|elite|glacier|dream|thamserku") ~ "Commercial",
      str_detect(SPONSOR_PREFIX, "private|individual|self|solo|1st|friends|father|jon|jost|alex|kilian|marc|kishori|soren") ~ "Private/Individual",
      str_detect(SPONSOR_PREFIX, "national geographic|japanese|korean|slovakian|french|chinese|indian|nepalese|russian|german|austrian|canadian|italian|spanish|swiss|american|latvian") ~ "National Program",
      is.na(SPONSOR_PREFIX) | SPONSOR_PREFIX == "" ~ "Other/Unknown",
      TRUE ~ "Other/Unknown"
    )
  )|> 
  mutate(
    FUNDING_SIMPLIFIED = case_when(
      FUNDING_TYPE == "Commercial" ~ "Commercial",
      FUNDING_TYPE %in% c("Academic/Alpine Club", "National Program", "Military") ~ "Non-commercial",
      FUNDING_TYPE %in% c("Private/Individual", "Other/Unknown") ~ "Independent/Other"
    )
  )

#glimpse(exped_q2)

Motivation & Setup

Does Funding Type Affect Expedition Safety?

Purpose

Commercial expeditions are often seen as safer due to better logistics, guides, and resources.

The following analysis aims to address this assumption by comparing commercial, private, and non-commercial groups.

How Sponsors Were Categorized

To determine funding source the data was wrangled and sorted using the “SPONSOR” column. This was accomplished by repeatedly editing the sorted list. The sponsor column were nearly all unique. I found an example here:https://stackoverflow.com/questions/59082243/multiple-patterns-for-string-matching-using-case-when

Summary Statistics on Safety

Code
# Summarize safety metrics by funding type
library(dplyr)
library(knitr)

exped_q2 |>
  group_by(FUNDING_SIMPLIFIED) |>
  summarise(
    n_expeditions = n(),
    n_success = sum(ANY_SUCCESS),
    n_fatalities = sum(FATALITIES),
    pct_success = mean(ANY_SUCCESS),
    pct_fatal = mean(FATALITIES),
    .groups = "drop"
  ) |>
  mutate(
    pct_success = scales::percent(pct_success, accuracy = 0.1),
    pct_fatal = scales::percent(pct_fatal, accuracy = 0.1)
  ) |>
  kable(
    caption = "Expedition Outcomes by Funding Type (2020–2024)",
    col.names = c(
      "Funding Type",
      "Total Expeditions",
      "Number Successful",
      "Number with Fatalities",
      "Success Rate",
      "Fatality Rate"
    ),
    format = "html"
  )

Let’s Start the plotting

How do Expeditions End?

These figures shows that proportionally, accidents are more common with None-commercial expeditions. Termination due to bad weather appears uniform across all groups, which is expected.

Code
ggplot(exped_q2, aes(
  x = FUNDING_SIMPLIFIED,
  fill = TERMINATION_TYPE
)) +
  geom_bar(position = "fill") + 
  scale_fill_brewer(
  palette = "YlGnBu",
  direction = 1,
  name = "Termination reason"
) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    x = NULL,
    y = NULL, 
    fill = "Termination reason",
    title = "How do expeditions end?",
    subtitle = "By funding type, Himalayan expeditions 2020–2024",
    caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(size = 6),
    axis.text.y = element_text(size = 6),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(margin = margin(b = 6), size = 8),
    plot.caption = element_text(color = "gray40", size = 6),
    legend.position = "right",
    panel.grid.major.x = element_blank()
  )

Code
ggplot(exped_q2, aes(x = FUNDING_SIMPLIFIED, fill = OUTCOME_LABEL)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_brewer(palette = "YlGnBu", direction = 1) +
  labs(
    title = "Expedition Outcomes by Funding Source",
    subtitle = "Independent groups have more fatal or failed climbs",
    x = NULL,
    y = "Proportion of Expeditions",
    fill = "Outcome"
  )

Code
ggplot(exped_q2, aes(x = FUNDING_SIMPLIFIED, fill = OUTCOME_LABEL)) +
  geom_bar(position = "fill") +
  scale_fill_brewer(palette = "YlGnBu", direction = 1) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    x = NULL,
    y = "Proportion of expeditions",
    title = "Distribution of Expedition Outcomes by Funding Type",
    subtitle = "Includes success and death combinations for Himalayan expeditions (2020–2024)",
    fill = "Outcome",
    caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    legend.position = "right",
    panel.grid.major.x = element_blank()
  )

Fatality Trend

Code
exped_q2 |> 
  group_by(YEAR, FUNDING_SIMPLIFIED) |> 
  summarise(fatality_rate = mean(FATALITIES, na.rm = TRUE), .groups = "drop") |> 
  ggplot(aes(x = YEAR, y = fatality_rate, color = FUNDING_SIMPLIFIED)) +
  geom_line(size = 1.2) +
  geom_smooth(method = "loess", se = FALSE, linewidth = 1, linetype = "dashed") +
  scale_color_brewer(palette = "YlGnBu", direction = 1) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Trend in Member Fatalities by Funding Type",
    subtitle = "Fatality rate = expeditions with ≥1 member or hired death",
    x = "Year",
    y = "Fatality Rate (%)",
    color = "Funding Type",
    caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
  ) +
  theme_minimal(base_size = 14)

It’s one thing to say commercial expeditions are safer but is that always true? I wanted to check whether fatality rates have changed over time, and if so, whether some groups are improving faster than others. This smoothed line chart shows the share of expeditions with fatalities by year and funding type. This is overlap with question 1.

Team Size vs Fatality

Next question: does team size make a difference? You’d think larger teams might be safer due to support and redundancy, but also possibly riskier if they’re less experienced or move slowly. This scatterplot checks whether total team size correlates with success, failure, or fatalities — and whether that varies by funding type.

Code
ggplot(exped_q2, aes(x = TOTMEMBERS, y = OUTCOME_LABEL, color = FUNDING_SIMPLIFIED)) +
  geom_jitter(width = 0.3, height = 0.2, alpha = 0.6, size = 2) +
  scale_color_brewer(palette = "YlGnBu", direction = 1) +
  labs(
    x = "Total team members",
    y = "Outcome type",
    title = "Team Size and Expedition Outcome by Funding Type",
    subtitle = "Each point is one expedition (2020–2024)",
    color = "Funding Type",
    caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
  ) +
  theme_minimal(base_size = 14)

Team Size and Expedition Outcome by Funding Type

Team Size by Funding As Proxy for Safety Resources

Large commercial teams often come with more resources: guides, oxygen, fixed ropes, medical support. That could partly explain why they have lower fatality rates.

Meanwhile, small non-commercial teams might:

  • Take harder routes (fewer sherpa or porters)

  • Push limits with less redundancy

  • Face more exposure if things go wrong

So while team size alone doesn’t determine safety, it’s a proxy for resourcing — and resourcing links back to safety.

Code
library(ggridges)

exped_q2 |> 
  filter(TOTMEMBERS < 50) |>  # Trim extreme outliers for better scaling
  ggplot(aes(
    x = TOTMEMBERS,
    y = FUNDING_SIMPLIFIED,
    fill = FUNDING_SIMPLIFIED
  )) +
  geom_density_ridges(
    scale = 1.2,
    rel_min_height = 0.01,
    alpha = 0.6,
    color = "white"
  ) +
  scale_fill_brewer(palette = "YlGnBu", direction = 1) +
  labs(
    x = "Team Size (Total Members)",
    y = "Funding Type",
    title = "Distribution of Team Size by Funding Type",
    subtitle = "Commercial expeditions tend to bring more people",
    caption = "Source: Himalayan Database via TidyTuesday (2025-01-21)"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none")